
    Low Complexity Interpolation Filters for Motion Estimation and Application to the H.264 Encoders

    Techniques for image super-resolution play an important role in many applications, including video compression and motion estimation. Detecting fractional displacements between frames facilitates the removal of temporal redundancy and improves video quality by 2-4 dB PSNR. However, the added complexity of the Fractional Motion Estimation (FME) process imposes a significant computational load on the encoder and constrains real-time designs. Researchers who performed timing analyses of the motion estimation process report that FME accounts for almost half of the entire motion estimation time, which in turn accounts for 60-90% of the total encoding time, depending on the design configuration.
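
For illustration only, the sketch below computes half- and quarter-sample luma values with the standard H.264/AVC 6-tap filter (1, -5, 20, 20, -5, 1)/32, i.e. the baseline interpolation whose cost FME designs try to reduce; it is not the low-complexity filter proposed in the paper. The function names, the 1-D row layout and the edge clamping (which mimics reference-frame padding) are assumptions of this sketch.

```python
def half_pel_1d(row, x):
    """Half-sample value between row[x] and row[x + 1] using the
    H.264/AVC luma 6-tap filter (1, -5, 20, 20, -5, 1) with rounding."""
    taps = (1, -5, 20, 20, -5, 1)
    n = len(row)
    acc = 0
    for k, t in enumerate(taps):
        # Neighbours x-2 .. x+3; indices are clamped to replicate edge samples.
        idx = min(max(x - 2 + k, 0), n - 1)
        acc += t * row[idx]
    # Round, divide by 32 and clip to the 8-bit sample range.
    return min(max((acc + 16) >> 5, 0), 255)


def quarter_pel_1d(row, x):
    """Quarter-sample between row[x] and the half-sample to its right:
    average of the two nearest samples with upward rounding."""
    return (row[x] + half_pel_1d(row, x) + 1) >> 1
```

On the linear ramp 0, 10, ..., 70 the half-sample between 30 and 40 evaluates to 35 and the quarter-sample to 33. Every fractional candidate needs six multiply-accumulates per direction, which is why FME dominates so much of the motion estimation time.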

    Programmable Motion Estimation Architecture

    This paper presents a real-time Motion Estimation architecture with reduced hardware cost. The design is based on a parallel memory organization that minimizes the resources required to support integer and sub-pixel search modes while sustaining the pixel throughput required by the SAD calculator. A speculative execution technique reduces the number of cycles required by the search process. The architecture is programmable, with an instruction set covering the actions common to all block-matching techniques, while circuits introduced at compile time accommodate the individual actions of the most demanding algorithms, such as MVFAST and PMVFAST. An FPGA implementation validates HDTV video performance.
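
The SAD-based block matching that such architectures accelerate can be sketched in software as an exhaustive integer-pel search. This is a generic full search, not the paper's architecture and not MVFAST/PMVFAST (which evaluate far fewer candidates); the frame layout (lists of rows), block size and search range are hypothetical parameters.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive integer-pel block matching; returns (best_mv, best_sad)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidates whose block would fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Each candidate reads a fresh n x n reference block, so the cost is dominated by memory traffic to the SAD unit; this is the bottleneck that a parallel memory organization addresses.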

    A Co-Design Approach For Rapid Prototyping Of Image Processing On SoC FPGAs

    Achieving real-time performance in image processing on embedded devices is a very challenging task because the algorithms are computationally and memory intensive. FPGA platforms provide very attractive solutions for such applications because they support highly parallel processing with low power consumption. In this paper we present an approach to increase productivity when developing real-time image processing algorithms on SoC FPGA devices. Our approach centers on fast communication between the HW and SW components and on the use of an open-source operating system hosted on the device's existing embedded processor. With this approach we decrease time-to-market without hindering the real-time operation of the system. As a proof of concept demonstrating the capabilities of the proposed system, we use the well-known Harris corner detection algorithm and the Xilinx Zynq XC7Z020 FPGA device. We present an in-depth performance analysis covering the FPGA resource utilization, the operating frequency, the communication overhead and the power consumption.
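
A minimal pure-Python sketch of the Harris corner response may help show why this proof-of-concept algorithm maps well onto parallel hardware: every pixel needs only local gradients and a small window sum. The central-difference gradients, the 3x3 box window (instead of a Gaussian) and k = 0.04 are assumptions of this sketch, not details of the paper's FPGA design.

```python
def harris_response(img, k=0.04, win=1):
    """Harris corner response R = det(M) - k * trace(M)^2 for each interior
    pixel of a grayscale image given as a list of lists (0.0 elsewhere)."""
    h, w = len(img), len(img[0])
    # Central-difference gradients (left at zero on the one-pixel border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    m = win + 1  # keep the window clear of the zero-gradient border
    for y in range(m, h - m):
        for x in range(m, w - m):
            sxx = syy = sxy = 0.0
            # Structure tensor M summed over a (2*win+1) x (2*win+1) box window.
            for v in range(y - win, y + win + 1):
                for u in range(x - win, x + win + 1):
                    sxx += ix[v][u] * ix[v][u]
                    syy += iy[v][u] * iy[v][u]
                    sxy += ix[v][u] * iy[v][u]
            r[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return r
```

Large positive R marks corners, negative R marks edges and values near zero mark flat regions; a complete detector would threshold R and apply non-maximum suppression.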

    Design Space Exploration on High-Order QAM Demodulation Circuits: Algorithms, Arithmetic and Approximation Techniques

    Every new generation of wireless communication standards aims to improve overall performance and quality of service (QoS) compared to previous generations. Increased data rates, greater numbers and capabilities of connected devices, new applications, and larger data transfers are some of the key parameters of interest. To satisfy these increased requirements, the synergy between wireless technologies and optical transport will dominate 5G network topologies. This work focuses on a fundamental digital function of an orthogonal frequency-division multiplexing (OFDM) baseband transceiver architecture and aims to improve its throughput and circuit complexity. Specifically, we consider high-order QAM demodulation and apply approximation techniques to achieve these goals. We adopt approximate computing as a design strategy to exploit the error resiliency of the QAM function and deliver significant gains in critical performance metrics. In particular, we explore four demodulation algorithms and develop accurate floating- and fixed-point circuits in VHDL. We further explore the effects of introducing approximate arithmetic components. For our test case, we consider 64-QAM demodulators; in terms of accuracy, the results suggest that the most promising design provides bit error rates (BER) ranging from 10⁻¹ to 10⁻⁴ for SNRs of 0–14 dB. Targeting a Xilinx Zynq UltraScale+ ZCU106 (XCZU7EV) FPGA device, the approximate circuits achieve up to a 98% reduction in LUT utilization, compared to the accurate floating-point model of the same algorithm, and up to a 122% increase in operating frequency. In terms of power consumption, our most efficient circuit configurations consume 0.6–1.1 W when operating at their maximum clock frequency.
    Our results show that if the objective is high accuracy in terms of BER, the prevailing solution is the approximate LLR algorithm configured with fixed-point arithmetic and 8-bit truncation, which provides an 81% decrease in LUTs and a 13% increase in frequency while sustaining a throughput of 323 Msamples/s.
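
The idea of an approximate LLR can be illustrated with the common max-log approximation, which replaces the log-sum of exponentials by the distance to the nearest constellation point in each bit subset. The abstract does not say whether this matches the paper's approximate LLR algorithm, so treat it as a representative stand-in. The sketch assumes square 64-QAM built from two Gray-labelled 8-PAM axes with levels -7..7, MSB-first bit order, I bits before Q bits, and `noise_var` as the complex noise variance N0; real standards fix their own bit mapping and scaling.

```python
def gray_8pam():
    """Assumed Gray-labelled 8-PAM levels used on each axis of square 64-QAM."""
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    return [(lvl, i ^ (i >> 1)) for i, lvl in enumerate(levels)]


def maxlog_llr_axis(y, noise_var=1.0):
    """Max-log LLRs of the 3 bits (MSB first) carried by one real axis.

    LLR > 0 favours bit = 0. Instead of summing over all points, each bit
    subset is represented by its nearest constellation level.
    """
    points = gray_8pam()
    llrs = []
    for bit in (2, 1, 0):  # MSB first
        d0 = min((y - lvl) ** 2 for lvl, lab in points if not (lab >> bit) & 1)
        d1 = min((y - lvl) ** 2 for lvl, lab in points if (lab >> bit) & 1)
        llrs.append((d1 - d0) / noise_var)
    return llrs


def maxlog_llr_64qam(sample, noise_var=1.0):
    """Six LLRs for one complex 64-QAM sample: 3 from I, then 3 from Q."""
    return maxlog_llr_axis(sample.real, noise_var) + maxlog_llr_axis(sample.imag, noise_var)
```

The sign of each value gives the hard decision and its magnitude the reliability. A fixed-point variant would quantize these values to a narrow word length; the 8-bit truncation in the abstract is one such choice, although its exact form is not given here.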

    Design and Comparison of FFT VLSI Architectures for SoC Telecom Applications with Different Flexibility, Speed and Complexity Trade-Offs

    This paper addresses the design of Fast Fourier Transform (FFT) integrated architectures for System-on-Chip (SoC) telecom applications. After reviewing the FFT processing requirements of wireless and wired Orthogonal Frequency Division Multiplexing (OFDM) standards, including the emerging Multiple Input Multiple Output (MIMO) and OFDM Access (OFDMA) schemes, three FFT architectures are proposed: a fully parallel, a pipelined cascade and an in-place variable-size architecture, which offer different trade-offs among flexibility, processing speed and complexity. Silicon implementation results and comparisons with the state of the art show that each macrocell outperforms known works for its target application. The fully parallel macrocell is optimized for throughput requirements of up to several GSamples/s, enabling Ultra-wideband (UWB) communications using all channels foreseen in the standard. The pipelined cascade macrocell minimizes complexity for large FFT sizes while sustaining throughputs of up to 100 MSamples/s. The in-place variable-size FFT macrocell stands out for its flexibility, allowing the run-time reconfigurability required by OFDMA schemes while attaining the throughput required to support MIMO communications. The three architectures are also compared using common case studies and target technology.
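
The in-place principle can be sketched in software with an iterative radix-2 decimation-in-time FFT: after a bit-reversal permutation, each of the log2(N) butterfly stages overwrites its own inputs, so only N storage locations are needed. This is a textbook algorithm offered for intuition, not the paper's variable-size macrocell, and the power-of-two length is an assumption of the sketch.

```python
import cmath


def fft_inplace(a):
    """Iterative radix-2 decimation-in-time FFT computed in place on list `a`.

    len(a) must be a power of two. Returns `a` for convenience.
    """
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    # Bit-reversal permutation so that butterflies can overwrite their inputs.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages, all reusing the same storage.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(size // 2):
                u = a[start + k]
                v = a[start + k + size // 2] * w
                a[start + k] = u + v
                a[start + k + size // 2] = u - v
                w *= w_step
        size <<= 1
    return a
```

Changing `n` at run time changes only the number of stages and the twiddle steps, which hints at why an in-place organization lends itself to the variable FFT sizes that OFDMA requires.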

    A 56 Gbaud Reconfigurable FPGA Feed-Forward Equalizer for Optical Datacenter Networks with flexible Baudrate- and Modulation-Format

    The staggering growth of datacenter traffic has spurred the rapid uptake of advanced modulation formats to increase throughput. Commodity optoelectronic components are used for cost efficiency, assisted by digital equalizers that mitigate their bandwidth limitations. With optically switched datacenter architectures gaining momentum, reconfigurable equalizers are sought that allow the receiver to adapt to the different fiber lengths, bitrates and modulation formats associated with different optical paths. An FPGA-based feed-forward equalizer (FFE) reconfigurable in baudrate and modulation format is demonstrated. We verify its performance with NRZ and PAM-4 experimental data at up to 56 GBaud, investigate its accuracy and extract the optimum FFE implementation for different transmission scenarios.
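
At its core, a feed-forward equalizer is an FIR filter followed by a symbol decision. The sketch below is a serial software model, not the paper's parallel FPGA design: the tap values, the `LEVELS` table and the function names are hypothetical, chosen to show how one FIR core can serve both NRZ and PAM-4 by swapping only the slicer levels.

```python
def ffe(samples, taps):
    """Feed-forward equalizer: FIR filter y[n] = sum_k taps[k] * x[n - k],
    with samples before the start of the sequence treated as zero."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(taps):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


# Hypothetical format table: the FIR core is shared and only the
# decision levels of the slicer change with the modulation format.
LEVELS = {"NRZ": (-1.0, 1.0), "PAM4": (-3.0, -1.0, 1.0, 3.0)}


def slice_symbols(values, fmt):
    """Map each equalized value to the nearest symbol level of `fmt`."""
    levels = LEVELS[fmt]
    return [min(levels, key=lambda lvl: abs(v - lvl)) for v in values]
```

For example, with a channel that adds half of the previous symbol (h = [1, 0.5]), the truncated inverse taps [1, -0.5, 0.25, -0.125] leave a residual of only 0.0625 times the symbol four positions earlier, so the PAM-4 decisions are recovered. A real receiver would adapt the taps (for example with LMS) rather than fix them.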